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quadratic convergence of the algorithm is shown under a suitable condition on the 
choice of coordinate systems. Our result extends and unifies previous convergence re- 
sults for Newton's method on a manifold. Using special choices of the coordinates, 
new numerical algorithms are derived for principal component analysis and invariant 
subspace computations with improved computational complexity properties. 
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1 Introduction 

Riemannian optimization is a relatively recent approach towards constrained optimiza- 
tion that uses full information on the underlying geometry of the constraint set in order 
to set up the optimization algorithms. The method is particularly useful if the basic in- 
gredients from differential geometry, such as the Levi-Civita connection and geodesies 
are explicitly available. This happens in many application problems arising in signal 
processing and numerical linear algebra, where optimization naturally takes place on 
homogeneous spaces, such as e.g. Stiefel or GraBmann manifolds. In this paper, we de- 
scribe a new class of Newton algorithms on Grafimann manifolds and study applications 
to eigenvalue and invariant subspace computations. 

The idea of using differential geometric methods to construct gradient descent 
algorithms for constrained optimization on smooth manifolds is of course not new and 
we refer to the textbooks [11||61IT6] for further information. Such gradient algorithms 
use first order derivative information on the function and thus can be described in 
a rather straightforward way. In contrast, Newton's method on a manifold requires 
second order information on the function, using an afhne connection in order to define 
the Hessian. This can be done in several different ways, thus leading to a variety of 
possible implementations of the Newton algorithm. 

In D. Gabay's work [5], the intrinsic Newton method on a Riemannian mani- 
fold is defined via the Levi-Givita connection, taking iteration steps along associated 
geodesies. More generally, M. Shub [14, proposed a Newton method to compute a zero 
of a smooth vector field on a smooth manifold endowed with an afhne connection. His 
algorithm is defined for arbitrary families of smooth projections np : TpM —> M, p G M, 
from the tangent bundle which have derivative equal to the identity at the base point. 
Therefore it is more general than Gabay's method and can be employed on arbitrary 
manifolds, without having to specify a Riemannian metric. In the case of a gradi- 
ent vector field on a Riemannian manifold endowed with the Levi-Givita connection, 
Shub's algorithm coincides with Gabay's, when {7i"p}pgM sue the Riemannian normal 
coordinates. 

In the PhD theses of St. Smith and R. Mahony [151113) . see also [3], the Newton 
method along geodesies of Gabay ^5 was rediscovered. However, the convergence proofs 
developed in these papers do not apply to the more general situation studied by Shub, 
except for the special case of Rayleigh quotient optimization on the unit sphere. In his 
recent PhD Thesis, P.-A. Absil [T], see also [2j, further discusses the Newton method 
along geodesies and derives a cubic convergence result in a special case. Moreover, 
variants with different projections were proposed, too. There are many more, recent 
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publications discussing aspects of Newton methods on Riemannian manifolds. We want 
to specifically mention the paper by Adler et al. [3] which is similar in spirit to this paper 
in so far as it provides explicit formulas for parametrizations and Newton algorithms 
on (SOa)^. 

In this paper, we propose a general approach to Newton's method on both Grafi- 
mann and Lagrange Grafimann manifolds that incorporates the previous ones as special 
cases, but allows also for implementations with improved computational complexity. 
We do so by replacing the family of smooth projections by an arbitrary pair of local co- 
ordinates /ip, Up with equal derivatives Dfip{0) = Di'p{0). Although this generalization 
might look minor at first sight, it is actually crucial to achieve better performance. Fol- 
lowing [7] and extending the known local quadratic convergence result for the intrinsic 
Riemannian Newton method, we prove local quadratic convergence of the generalized 
Newton algorithm. The Newton method on the Lagrange Grafimannian has not been 
considered before, but has important applications in control (e.g. to algebraic Riccati 
equations in linear quadratic control). 

The paper is structured as follows. In order to enhance the readability of the paper 
for non-experts, we begin with a brief summary of the basic differential geometry of 
the classical Grafimann manifold and the Lagrange Grafimannian, respectively, deriving 
explicit formulas for (projections onto) tangent spaces, normal spaces, gradients, Hes- 
sians, and geodesies. We then compute the Riemannian normal coordinates of the two 
types of Grafimannians. Using approximations of the exponential map via e.g. Fade 
approximants or the QR factorization, then leads to alternative coordinate systems 
and resulting simplified implementations of the Newton algorithm. By generalizing the 
construction of Shub, we introduce the Newton algorithm via a pull back/push for- 
ward scheme defined by an arbitrary pair of local coordinates for the Grafimannians. 
This leads to a rich family of intrinsically defined Newton methods that have potential 
for considerable computational advantages compared with the previously known algo- 
rithms. In fact, instead of relying upon the use of Riemannian normal coordinates, that 
are difficult to compute with, we advocate to use the much more easily computable 
local coordinates via the Q_R-factorization. 

For example, in Edelman et al. |4] the steps of the Newton algorithm on the classi- 
cal Grafimannian are defined in the ambient Euclidean space of the associated Stiefel 
manifold. This leads them to solving sequences of Sylvester equations in higher di- 
mensional matrix spaces than necessary. In contrast, our algorithms works with the 
minimal number of parameters, given by the dimension of the Grafimannian. More- 
over, our algorithms do not require the iterative calculation of matrix exponentials, 
but only involve finite step iterations using efficient QTi-computations. 

Finally, we apply these techniques to eigenspace computations. By applying our 
Newton scheme to the Rayleigh quotient function on the Grafimann (and Lagrange 
Grafimann) manifold, we obtain a new class of iterative algorithms for principal com- 
ponent analysis with improved computational complexity. For eigenspace computations 
of arbitrary, not necessarily symmetric, matrices we derive an apparently new class of 
Newton algorithms, that requires the repeated computations of solutions to nested 
Sylvester type equations. 
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2 Riemannian geometry of the Grai3mann manifold 

In this section we describe the basics for the Riemannian geometry of Grafimann man- 
ifolds, i.e. tangent and normal spaces, Riemannian metrics and geodesies. We focus 
on the real Grafimannian; the results carry through mutatis mutandis for complex 
Grafimannians, too. 

Recall, that the Grafimann manifold Grm.n is defined as the set of m-dimensional 
R-linear subspaces of R". It is a smooth, compact manifold of dimension m(n — m) 
and provides a natural generalization of the familiar projective spaces. Let denote 



On := {X £ 



|X'X = J}. 



(2.1) 



and 



SOr, 



{XGO„ldetX = l} 



(2.2) 



The Grafimann manifold can also be viewed in an equivalent way as a homogeneous 
space SOn(R)/-ff, cf. e.g. [6j and see below for a definition of H, for the transitive 
SOn-action 



(J SOn X Grm.n 

(r,v) 



Let 



Vq = colspan 







TV. 



G Gr„ 



(2.3) 



(2.4) 



denote the standard m-dimensional subspace of R" that is spanned by the first m 
standard basis vectors of R". Then the stabilizer subgroup H :— Stab(Vo) of Vq is 
given by 

If = I ^ ° G SOn 1 U G On., V G On-m | , (2.5) 

i.e. by the compact Lie subgroup of SOn consisting of all block diagonal orthogonal 
matrices. The map 

SOn /H Gr™,n, OH k-. ©Vq (2.6) 

then defines a difFeomorphism of the Grafimann manifold with the homogeneous space 
SOn /H. See Edelman et al. [4], Absil 1 and Hiiper and Trumpf 7 for further details 
on Newton's method on Gvm.n^ in a variant that exploits the homogeneous space struc- 
ture of the Grafimann manifold. Here we develop a different approach, by identifying 
Grm.n with a set of self-adjoint projection operators. 
Thus we define the Grafimannian as 



Grn 



{PG 



P,P^ = P,trP = m}, 



(2.7) 



the manifold of rank m symmetric projection operators of R"; see W for the con- 
struction of a natural bijection with the Grafimann manifold and a proof that it defines 
a diffeomeorphism. In the sequel we will describe the Riemannian geometry directly 
for the submanifold Grm.n of R"^". As we will see, this approach has advantages that 
simplify both the analysis and design of Newton-based algorithms for the computation 
of principal components. 
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We begin by recalling the following known and basic fact on the Grafimannian; see 
[SI Section 2.1] for a proof in the more general context of isospectral manifolds. Let 

Sym„ — {5 G R"""" I 5^ = 5} (2.8) 

and 

sOn ■- W G K"""" I = (2-9) 

denote the vector spaces of real symmetric and real skew- symmetric matrices, respec- 
tively. 

Theorem 2.1 (a) The Grafimannian Grm,n is a smooth, compact submanif old of Sym„ 

of dimension m{n — m). 
(h) The tangent space of Grm.n at an element P £ Gvm,n is given as 

Tp Gr„i,„ = {[p, n] \ ne so„}. (2.10) 

Here [P, O] := PH — OP denotes the matrix commutator (Lie bracket). 
Let 

adp : R"><" R"^", 

(2.11) 

adp(X) := [P,X] 

denote the adjoint representation at P. For a projection operator P it enjoys the 
following property. 

Lemma 2.1 For any P G Grm,n, the minimal polynomial of adp : R"^" — * R"^" is 
equal to — s. Thus adp = adp. Moreover, 

adp X = [P, [P, X]] = X (2.12) 

holds for all tangent vectors X £ Tp GTm,n- 

Proof From P^ = P we get 

adp X = [P, [P, X]] = P^X + XP"^ - 2PXP = PX + XP- 2PXP (2.13) 

and therefore, using P^ = P again 

adp X = P{PX + XP- 2PXP) - {PX + XP- 2PXP)P 

= PX-XP (2.14) 
= adp X 

for all n x n-matrices X. If X — [P, SI] is a tangent vector, then adp X — adp f2 = 
adp f2 = X. The result follows. □ 

We use this result to describe the normal bundle of Grm.n. In the sequel, we will 
always endow Sym„ with the Frobenius inner product, defined by 

{X,Y) —triXY) (2.15) 

for all X,Y £ Sym„. Since the tangent space Tp Grm.n C Sym„ is a subset of Sym„ 
(using the usual identification of Tp Sym„ with Sym„), we can define the normal space 
at P to be the vector space 

Np Gr,„,„ = (Tp GTm,n)^ ■■= {X G Sym„ | tr(XY) = for all F G Tp Gr™,„}. (2.16) 
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Proposition 2.1 Let P G Gr,n,n be arbitrary. 
1. The normal subspace in Sym„ is given as 



Np Gr„j,„ = {X -adlX \ X £ Sym„}. (2.17) 



2. The linear map 

is the self-adjoint projection operator onto Tp Grm.n with kernel Np Grm,n 



TV : Sym„ ^ Sym„, X ^ a.d% X ^ [P, [P, X]] (2.18) 



T 



Proof For any tangent vector [P, f2] G Tp Grrn.n, where J7 = —f2, and any X = X 
we have 

trilP, f2]{X - adp X)) =tr(([X,P] - [adpX,P])n) 

= tr(([X,P]+ad|,X)f2) ^^^g^ 
= tr((adpX-adpX)i7) 
= 0, 

since adp = adp . Therefore, Tp Grm,n and {X — adp X \ X G Sym„} are orthogonal 
subspaces of Sym„ with respect to the Frobenius inner product. Their sum also spans 
Sym„, as otherwise there exists a nontrivial S £ Sym„ that is orthogonal to both 
spaces; but then for all G sOn 

tr(S'[P, i7]) =tr([5',P]r2) = =^ [5, P] = 0, (2.20) 

and for all X G Sym„, using (I2.20p 

tr(S'(X - ad|. X)) = tr(S'X - [S, P] [P, X]) = tr(S'X) = (2.21) 

which implies S — 0, a. contradiction. Thus the two spaces define an orthogonal sum 
decomposition of Sym„ and therefore {X — adpX|X G Sym„} must be the normal 
space. This completes the proof for the first claim. 
Since tt = adp, we have 

TT^ = adp — adp = tt (2.22) 

because adp = adp. Moreover, by definition of tt we have imTr C Tp Grm.n, cf. p.lOp . 
and for any X G Tp Grm,n we have by Lemma [2.1 1 that n{X) = X. Therefore 

im7r = rpGr™,„. (2.23) 

For any X — adp X G Np GTm,n we have 

7r(X-adpX) = adpX-adpX = 0, (2.24) 

by (|2.22|) . Since Np Grm.n is the orthogonal complement to the tangent space in Sym„, 
a straight forward dimension argument yields kerTr = Np Grm,n- Finally, using the 
Frobenius inner product on Sym„ , we have for all Xi , X2 G Sym„ 

(^(Xi),X2) =tr((ad|,Xi)X2) 
= tr([P,[P,Xi]]X2) 
= tr([P,[P,X2]]Xi) 

= (Xi,^(X2)). 

Thus TT is self-adjoint and the result follows. □ 
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A formula for vr in the language of linear maps has already been given in [121 Section 
4.2]. 

There are at least two natural Riemannian metrics defined on the Grai3mannian 
Grm.n, the induced Euclidean metric and the normal metric, cf. e.g. [6] or [13] . 

The Euclidean Riemannian metric on Gr^.n is defined by the Frobenius inner 
product on the tangent spaces 

{X,Y) --triXY) (2.26) 

for all X,Y £ Tp GTm,n which is induced by the embedding space Sym„. 

The normal Riemannian metric has a somewhat more complicated definition. Con- 
sider the surjective linear map 

adp : SOn ^ Tp Grm.ra, 

2.27 

with kernel 

keradp = {i? G so„ | Pn = J7P}. (2.28) 

We regard sOn as an inner product space, endowed with the Frobenius inner product 
(i7i,J72) = tr(^22^J72) = — tr(J7i^22)- Then adp induces an isomorphism of vector 
spaces 

adp : (keradp)^ ^ Tp Gr„i,„ (2.29) 

and therefore induces an isometry of inner product spaces, by defining an inner product 
on Tp Grrn,n via 

{{X,Y))p ~^tr{^p\x)^p\Y)). (2.30) 

Note, that this inner product on Tp Grm.n, called the normal Riemannian metric, might 
vary with the basepoint P. Luckily, the situation is better than one would expect, as 
Proposition 12.31 below shows. 

But first we will show that the operator adp, P G Grm.n, is equally well behaved 
on SOn as it is on Sym„, cf. Proposition 12.11 

Proposition 2.2 Let P £ Gr„i,n be arbitrary. The linear map 

ad^ ; SOn SOn, O ^ [P, [P, n]] (2.31) 

is the self-adjoint projection operator onto (keradp)^ along keradp. 

Proof Let J7 G sOn be arbitrary. By Lemma 12.11 we know that X :— adp(J7) — 
adp(adp i7). But since for all Hi G keradp C sOn 

tT{a.dp{n)ni) = tr(adp(i7) adp(J7i)) = 0, (2.32) 

we conclude adp J7 G (keradp)^ C sOn- Now let SI G (keradp)"*". Then adp i7 G 
(keradp)"*" and hence O — adp O G (keradp)"*". By Lemma l2.ll adp(J7 — adp O) — 
adp f2 — adp i? = and hence J7 — adp i7 G keradp. It follows f2 — adp O — and 
thus adp n — O. 

We have shown imadp C (keradp)^ and that the restriction of adp to (keradp)"*" 
is the identity. It remains to show that ker adp = ker ad p , but this follows readily from 
Lemma 12.11 □ 
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Proposition 2.3 The Euclidean and normal Riemannian metrics on the Grajiman- 
man Grm.n coincide, i.e. for all P G Grm.n and for all X,Y G Tp Grm,n we have 

tr(X^y) = -tr {aAp'(X) adp\y)^ . (2.33) 

Proof Choose 

r22 G (keradp)-^ with X = [P, Di] and Y = [P, ^2]- (2.34) 

Then 

- tr(adp\x)adp\F)) = tT{nJ 02)- (2.35) 



On the other hand 



tr(X'Y)^tr{[P,n^][P,n2]) 

= tT{[P,[P,Qi\\^ Q2). 



(2.36) 



Now by Proposition 12 . 2 1 we know that adp fli — ili and this implies 

tT{x'^Y) =tj:{nj 02), (2.37) 

as claimed. □ 

Since these two Riemannian metrics on the Grafimannian coincide, they also define 
the same geodesies. Thus, in the sequel, we focus on the Euclidean metric. Note, that 
the above result is not true for arbitrary fiag manifolds and in fact, the geodesies are 
then different for the two metrics. The following result characterizes the geodesies on 

Theorem 2.2 The geodesies of Grm,n are exactly the solutions of the second order 
differential equation 

P+[P,[P,P]]^0. (2.38) 

The unique geodesic P{t) with initial conditions P{0) = Pq £ Grm.n, -P(O) — Pq £ 
Tpg Grm,n is given by 

P{t) = e*[^"'^ol Poe"*!^""^"! . (2.39) 

Proof The geodesies of Grm,n for the Euclidean metric are characterized as the curves 
P(t) G Grm,n, such that P{t) is a normal vector for all t G R. This condition is 
equivalent to the existence of S{t) = S{t)^ with 

P = 5 - adp S'. (2.40) 

Since by Lemma [2. II adp = adp this implies 

adp P = [P, P] = 0. (2.41) 

Moreover, any curve P{t) G Grm.n satisfies the identity 

P = adp P, (2.42) 
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as adp acts as the identity on the tangent space Tp Grm,n- By differentiating equation 
(|2.42p we obtain 



P = [P, [P, P]] + [P, [P, P]] + [P, [P, P]] 
= - ad^ P + adp P. 

Therefore, if P{t) is a geodesic, then adp(P) — and 



P = - a.dp P + adp P 
= - ad^ P 



(2.43) 



(2.44) 



and therefore satisfies P + [P, [P, P]] = 0, as claimed. We now check, that every curve 
P(t) as in (p39|l is a solution to (lOSl) . Let J? — [Pq, Pq]- Then 



P = [n,p], P = [n,[n,p]] 

and thus (|2.38|l is equivalent to 

[^2,[^2,p]] + [[^2,p],[[^2,p],p]] = 0. 



(2.45) 



(2.46) 



Multiplying by the left and right with e and e*^ respectively, we see that (|2.46|l is 
equivalent to 

[a, [n, Po]] + [[a, Pq], [[n, Po], Pq]] = o. 



Without loss of generality we can assume that 

Po = 

and therefore 



Im 




n = 



with z e 



and also 



X (n— m) 



Thus 



[n, [n,Po]] 



z 
-Z^ 



-2ZZ' 



2Z ' Z 



[^2,Po],[[^?,Po],Po]l 



2ZZ 







-2Z^Z 



(2.47) 



(2.48) 



(2.49) 



(2.50) 



(2.51) 



This implies (|2.46p and shows that any curve given by (f239|l is a solution of (f238|) . 
Since any Pq G Grm,ri and [Pqi-Po] G Pp Grm.n are admissible initial conditions for 
(|2.38p . and since the resulting initial value problem has a unique solution (namely 
(|2.39p \ this shows that (|2.39p is exactly the set of aii solutions of (|2.38|l . Moreover, for 
the particular initial point 

\Irn 01 







one observes that 



[[n,Po],[[n,Po],Po]] 



2ZZ 



T 







-2Z ' Z 



(2.52) 



(2.53) 
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is a normal vector to the Grafimannian at Pq. Thus, by invariance of the normal 
bundle under orthogonal similarity transformations Pq > Pq0, O G SOn, we see 
that [P, [P, P]] is a normal vector to Tp Gxm.n for all P G Tp Grm.ji- Thus, for any 
solution P{t) of (|2.38|) also P — —[P, [P, P]] is a normal vector, and hence all solutions 
of (|2.38p are geodesies. □ 

The above explicit formula for geodesies leads to the following formula for the 
geodesic distance between two points on a Grafimannian. We omit the simple proof; 
see also [I] for a slightly different formula which is only valid on an open and dense 
subset of the Grafimannian. 

Corollary 2.1 Let P,Q e Gvm.n- Given any G SOn such that 



we define 



P = o' 



Hi '^12 

^12 Q22, 



/m 0' 








0Q0 ' 



Let 1 > Ai > • • • > Am > denote the eigenvalues of Qi\. The geodesic distance of P 
to Q in Gim,n is given by 



2 arccos^(\/Aj) 
\j i=l 



dist(P,Q) = 

Alternatively, let 1 > fii > ■ ■ ■ > fin-m > denote the eigenvalues of Q22- Then 



(2.54) 



dist(P, Q) = 



\ 



2E 



YY^,Y^Y 



In particular, if P,Q € Grm,n with 

idist^(P,Q) =tr (^arccos^(CK^PF)5)^ 



then 



(2.55) 



(2.56) 



Note that formula (|2.55p is more efflcient in the case 2m > n. Note also that our 
formulas imply that the maximal length of a simple closed geodesic in Grm,ri is \/2m-Tv 
for 2m < n and •y/2(n — m) • tt for 2m > n. 



2.1 Parametrizations and Coordinates for the Grafimannian 

In this section we briefly recall the notion of local parametrization for smooth manifolds. 
For further details we refer to [10]. Let M be a smooth n-dimensional real manifold 
then for every point p £ M there exists a smooth map 

(Up : R" — > M, /ip(0) = p 

which is a local diffeomorphism around G R". Such a map is called a local parametriza- 
tion around p. 
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We consider local parametrizations for the Grafimaiiriian via the tangent space, i.e. 
families of smooth maps 

flp : Tp Grm,n Grm,n (2-57) 

satisfying 

/ip(0) = P and D^p(O) = id. (2.58) 
We introduce three different choices of such local parametrizations. 

2.1.1 Riemannian normal coordinates 



Riemannian normal coordinates are defined through the Riemannian exponential map 
(see e.g. [8]) 



Hp^ = expp : Tp Gr„i,„ Grm,„, 

expp(C)=e[«'^lpe~[«'^l 



(2.59) 



Remark 2.1 Note that by Theorem l2.2l the unique geodesic P{t) with initial conditions 
P(0) = Po and P(0) = Po is given by P{t) = expp„(tPo) = n°p^itPo). 



Obviously, expp is smooth with 



and 



expp(O) = P 
Dexpp(O) — id. 



(2.60) 
(2.61) 



Dexpp(0)e=K,P],Pl 
= adp ^ 



(2.62) 



for all ^eTp GTm,n ■ 



Such Riemannian normal coordinates can be explicitly computed as follows. Given any 
e G SO„ with 



we can write 



P = © ' 



[e,P] = ©^ 



Ira 








Z' 

-Z^ 



with Z G 



X (n— m) 



Since 



i-zz') 





T \m 



e 





T ry'\7n 



i-z'z) 



e 



we obtain 



e = e' 



cos v/ZZT Z 2iiL^S 



0. 



(2.63) 
(2.64) 

(2.65) 
(2.66) 
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Here, as usual, it is understood that 



^sm^2=Z(Z^Z)-isin(Z^Z)^ 



oo 
oo 

= E 



(2j + l)! 

(2i + l)! 
(-l)'(ZZ"^)' 



2i 



i=0 
sin VZZ^ 



Therefore 



expp(^) = 6> 



cos VZZT 

sin VZ^Z 



Vz^ 

cos^ VZ^T 



[cos\/zF' 



(2i + l)! 
Z. 



iVz 



s/Z 



cos^ V^^T 
-Z^ sine f 2\/^ 



-cosV^ZTsin^I z 
sin^ V^T^ 



sine (2V^ZT] ^ 



24ZZ^^ 



sine (^2VZZi 
-Z^ sine {2\fZZ^^ sin (^2VzTzj 



6> 



(2.67) 



(2.68) 



QR- coordinates 
We define Q^i-coordinates by the map 

H"^'"- : Tp Gr 

i^{I+[LP])QP{{I+[LP])Q) 



(2.69) 



Here Mq denotes the Q-factor in the Qi?-factorization M = MqM^ of M. Note that 
the matrix 

/ Zl 



7 + [e,p] = 



-Z' I 







(2.70) 



is always invertible and therefore the Q-factor (/ + P])q G On(M) exists, and more- 
over, is unique if the diagonal entries of the upper triangular factor R are chosen 
positive. From now on, we always choose the i?-factor in this way. Actually, the deter- 
minant of the Q-factor, 

det(7+K,P])Q = l, (2.71) 
i.e., (7+ [^,7'])q £ SOra(R), as it is easily checked that det ^-z^ / ] ^ always. 
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Moreover, by the smoothness of the Q_R-factorization for general invertible matrices 
(follows from the Gram-Schmidt procedure rather than from the usual algorithm via 

OR 

Householder transformations), the map /ip is smooth on the tangent spaces Tp Gvm.n 
with 

fj,f^(0) = P. (2.72) 

Now by (f2:72ll 

D/x^^(0) iTpG (2.73) 

and a straightforward computation shows that D ^p^(O) = id. In fact, by differentiat- 
ing the Qi?-factorization 



we obtain 
and therefore at t — 



I + P] = Q{t)R(t), Q(0) = /, i?(0) = I 
[C, P] = QR + QR 



K,P] = Q(0)+7?(0). 



(2.74) 

(2.75) 
(2.76) 

But [5, P] and Q(0) are skew-symmetric, while -R(O) is upper triangular. Thus R(Q) = 
and therefore 

^AI + t[i.P])Q =Q(0) = [C,P]. (2.77) 

This shows 

DM?^(0)C=[[C,P],Pl=e (2.78) 
for all ^ G Tp Gvrn.n, as claimed. 

There exist explicit formulas for the Q and 7?-factors in terms of Cholesky factors. 
In fact, with 

r/m Ol 



and since 



satisfies 



P = 0' 



©(/-f [C,P])e' = 



xx^ = x^x = 



^ 

— Z^ In—r. 



= : X 







/„_,„ + z'^z 

— XqXyi is obtained as 



the Q_R-factorization of X 



and 



R^-^ ZR^^ 



-Z' R^^ R22 



Rii 

. R22_ 

Here Rn and R22 are the unique Cholesky factors defined by 



^11^11 — + ZZ , 

R22R22 = In-m + Z^ Z. 



(2.79) 
(2.80) 
(2.81) 

(2.82) 
(2.83) 

(2.84) 



Note, that vanishing of the 12-block of Xr follows from the invertibility of Rn and 
R22 and equation (|2.81|l . 
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2.1.3 Cayley coordinates 

Another possibility to introduce easily computable coordinates utilizes the Cayley 
transform. For any skew-symmetric matrix O the Cayley transform 



Cay : sOn SOn, 

{21 + n){2i - n)~^ 

is smooth and satisfies D Cay(O) = id. The Cayley coordinates are defined as 



A*p*y :TpGr„, 



^^Cay([^,P])PCay(-K,P]). 



(2.85) 



(2.86) 



The above mentioned property of the Cayley transform implies that ji^^ is smooth 

and satisfies 



DMp"''(0)C = D Cay(0)([e, P])P - PD Cay(0)K, P] = [[C, P], P] = C 

Cay/ 



(2.87) 



for all tangent vectors ^ £ Tp Grm.n- Moreover, fXp (^) is easily computed as follows. 
For 



p = e' 



Im 





0, i = e' 



-z 

-Z^ 



6> 



(2.88) 



a straightforward computation shows, using Schur complements and properties of the 
von Neumann series, that 



Cay 



Z 

-z'^ 



2/m Z 

Z llji—m 

2/m Z 

~Z ^Iji — jYl 



2Im — Z 

Z 2In — r 



\Z{In-m + \Z^ Z) ^ 



~\Z'^ [Im-\-\ZZ'^) ^ ^[In—m'^'^Z'^Z) ^ 



Izz^ 



-z' 



-^7'^ 7 



Im ~\-\ZZ~^ 



In—m'\'\z'^Z 



Therefore, 



M?^(O = 0^ Cay 



-Z^ 



rZZ 



Im 





Cay 







e 



1 \ ~^ 
Im + -zz^ \ [z™ - Izz'^ -z\ 0. 



(2.89) 



Now 



Izz^ 



Im + ^ZZ^ 



(2.90) 



is a basis matrix with orthonormal columns and therefore /Up*'^(^) is exactly the pro- 
jection operator associated with the linear subspace 



colspan 



-z' 



(2.91) 
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2.1.4 Approximation properties of parametrizations 

We have already shown that 

DMpP(O) = Dm^^''(0) = DpQ«(0) = id. 
Moreover, it holds 

Theorem 2.3 Let P G Gr.m,n and [£,,P] as in (|2.63p and (|2.64p . respectively. Then 



(2.92) 



d ,.exp 

37^ 



le=0 



e=0 



e=0 



-2ZZ ' 
2Z^Z 



0. (2.93) 



Note, that the right hand side of p.93[l is independent of the choice of in (|2.63[) . 
Proof Taking derivatives yields 

3&^r(^6L^^=^e[^«^-lPe-[^«-ll 



le=0 



= K,P]'P + PK,P]'-2[C,P]PK,P] 



= 0' 



-2ZZ ' 
2Z^ Z 



(2.94) 



From the theory of Pade approximations it is well known that for all matrices 
X e sOn and e G R 



e''^ = (2/„ + eX){2In - eX)-'^ + ©(e^). 



Consequently, 



e=0 



-2^^ ' 
2Z^Z 







(2.95) 



(2.96) 



holds as well. 

We now proceed with fip^. Let e G R be a parameter and let 



X{£) :=e(j + e[e,p])e^ 



eZ 



-eZ' In 



with QP-factorisation 



X{e) 



X(£) = X(e)QX(e)R, 



where the Cholesky factors Ra are defined via 

rIi{£)Rii{£) = hn+e^ZZ'^ , 



Obviously, 



-R22(e)i?22(e) = In-m+e^z'^Z. 



Pll(0) = /m and P22(0)=/„ 



(2.97) 
(2.98) 
(2.99) 

(2.100) 
(2.101) 
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Therefore, taking the first order derivatives in (|2.100|l and evaluating at e = gives 



iiA(0)+i?ii(0) = 0, 

^j2(0)+i?22(0)=0, 



(2.102) 



which imply Rii{0) — and -R22(0) = 0. Furthermore, taking second order derivatives 
at e = and using (|2.10Hl and (|2.102|) gives 



R{i{0) + Rii{0) ^2ZZ' 



R22{0)+R22iO)=2Z' Z. 
Using (|2.10ip , (|2.102p and (|2.103p we compute the derivatives of the inverses as 



(2.103) 



and 



Therefore, 



^RnHe\^^ = -Rii{0), 
^R-;(s)\^^^^-R22{0). 



(2.104) 



(2.105) 



^(0)q 



de 



e=0 



e=0 



z 

-Z^ 

-i?ii(o) 

-/?22(0). 



(2.106) 



and finally, 



e=0 ^ ' 



0' 



^i?ii(0)-i?A(0) 




2Z^Z 



-2ZZ' 







2Z' Z 







(2.107) 



as required. 
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2.2 Gradients and Hessians 

Let F : Sym„ — » R be a smooth function and let / := F\q,-^ ^ denote its restriction 
to the Grafimannian. Lot \/p{P) £ Sym„ be the gradient of F in Sym„ evaluated at 
P. Let Hp'(P) : Sym„ x Sym„ —^ E denote the Hessian form of F evaluated at P. We 
also consider Hessi?(P) : Sym„ — » Sym„ as the corresponding linear map. Gradient 
and Hessian are formed using the Euclidean (Frobenius) inner product on Sym„ . 

The next result computes the Riemannian gradient and Riemannian Hessian of 
the restriction / with respect to the induced Euclidean Riemannian metric on the 
Grafimannian (and thus also for the normal Riemannian metric). 



Theorem 2.4 Let f : GTm,n K- The Riemannian gradient, gtaO^-, and the Rie- 
mannian Hessian operator 9)tsSf{P) : Tp GTm,n — » Tp GTm,n are given as 

gvaOfiP) = ad%{VF{P)) = [P, [P, Vf(P)]], (2.108) 



and 

9)tsSf{P){0 = aAp (HessF(P)(0) -adpadvp(P)e, (2-109) 
for all^ e TpGrm,n- 



Proof The first part follows immediately from the well known fact, that the Riemannian 
gradient of / coincides with the orthogonal projection of Vjr onto the tangent space 
Tp GTm,n- Since the orthogonal projection operator onto the tangent space is given by 
adp, this proves the first claim. 

For the second part, consider a geodesic curve P{t) with P(0) = ^. Then P = 
— [P, [P,P]] and therefore the Riemannian Hessian form is 



%(P(0))(e,0 :=^!i^[^^ 

= HF(P(0))(e,O + DP(P(0)) P(0) 

= H^(P(0))(e,O - DP(P(0)) K, K,P(0)]] 

= H^(P(0))(^, - tr (V^(P(0)) K, K, P(0)]]) . 



Thus by polarization 



%(P)(^,»?) = HF(P)(e,»7)-^r(VF(-P) [e,[7?,P]])- ^r(Vir(P) [v,[i,P]])) 
= tr((HesSf (P)(0 - ^[P, [VF(P),e]] - ^K, ^^1, Vf(P)]) rj) 
= tr ((Hessf (P)(0 - |[P, [Vf{P),^]] - 5[Vf(P), [P,^]) v) 

= tr (ad|, (HessF(P)(0 - hiP^ [Vf(P),^]] - ^[Vf(P), [P,i]])v)- 

(2.111) 
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This implies that the Riemannian Hessian operator is given as 

iDefiS/(P)(C) = adp Hessp (P) (C) - ladpadvy(p) C - ^advj,(p) adp 

= adp (^Hcssp(P)(^)^ - i adpadvp(p) C - ^ adpadvp(p) adp^ 
= ad| (^Hessp(P)(0) - adp advp(p) C - i ad|,[advj,(p), adp]^ 
= adp ^Hessp(P)(0) - adp advp(p) ^ - 5 adp[advp(p), adp]^ 
= adp ^Hessp(P)(0^ - adp advp(p) ^ - 5 adp ad[Vp(p),p] 

(2.112) 

The result thus follows from the following lemma. □ 

Lemma 2.2 For any tangent vector ^ G Tp Grm.n and any A £ Sym„ one has 



adp ad[^ p] ^ = 0. 
Proof Without loss of generality we can assume that 



(2.113) 



P = 



Im 





5 = 



z 




A = 



Ai A2 

[aJ As\ 



(2.114) 



Then 



and 



-A2 
Aj 



ad I 



[A,P] ^ - 



-ZAJ - A2Z^ 



A^Z + Z' A2 



from which adp adj^ ^j ^ = follows by a straightforward computation. 



(2.115) 



(2.116) 



n 



As a consequence wc obtain the following formulas for the Riemannian gradient 
and Riemannian Hessian operator of the Rayleigh quotient function. 

Corollary 2.2 Let A £ Syni„ . The Riemannian gradient and Riemannian Hessian 
operator of the Rayleigh quotient function 



f : Grm,n ^ K, /(P) := tr(AP) 



(2.117) 



0raO^(P) = [P,[P,A]], 
SjtsSf{P) = — adp o ad^, 



(2.118) 



respectively. 



19 



As another example, let us consider the function 
F : Sym„ -> R, 

o T (2.119) 

P i-» 11(7 - P)APf = tr(7 - P)APA^ 

for an arbitrary, not necessarily symmetric matrix A € R"^". Note that the global 
minima of the restriction / := P|Gr^ „ to the Grafimannian are exactly the projection 
operators corresponding to the m-dimensional invariant subspaces of A. The gradient 
and Hessian operator on Sym„ are computed as: 



■^F{P + £H) 
as 



= tT{-HAPA'^ + P)AHA'^) 

£=0 

= trH{A'^{I - P)A- APA'^), (2.120) 



d2 



-^F{P + eH) 



de 

by polarisation for H,K € Sym„ 



= -2tTHAHA ' , 

£=0 



^{-2tT{H + K)A{H + K)A\2{H - K)A{H - K)A^) = -trH{AKA'^+ a'^KA) 
consequently, 



Vp(P) =A'^(I- P)A - apa'^, 

\ T (2.121) 

Hcssp{P){^) = -A^^A - A^A^. 

This leads to the following explicit description of the Riemannian gradient and Rie- 
mannian Hessian operators on the Grafimannian. 

Corollary 2.3 Let A € R"^". The Riemannian gradient and Riemannian Hessian 
operator of 

f : GTm,n ^ R, /(F) := 11(1 - P)APf (2.122) 



QtadfiP) = [P, [P, a'^A - a'^PA - APA^]], 
SjtBBfiPm = -[P, [P, a'^^a + a^a'^]] - [P, [A'^A - a'^pa - APA'^, ^]], ^^'^^^^ 
respectively. 



3 Geometry of the Lagrange GraBmannian 



In this section we develop am analogous theory for the manifold of Lagrangian subspaces 
Thus we consider the Lagrange Grafimann manifold LGn, consisting of all n- 



T,2n 



dimensional Lagrangian subspaces of R^" with respect to the standard symplectic form 



/„ 
-In 



(3.1) 



Recall, that an n-dimensional subspace V C R^" is called Lagrangian, if 



Jv = 



(3.2) 
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for all V £ V. Instead of interpreting the elements of the Lagrange Grafimann manifold 
as maximal isotropic subspaces, we prefer to view them in an equivalent way as a certain 
subclass of symmetric projection operators. Note that, if P is the symmetric projection 

operator onto an n-dimensional linear subspace V, then the condition PJP = is 
equivalent to V being Lagrangian. Thus wo define the Lagrange Graflrnannian 

n2 



LGn ■- {P e Sym. 



2n 



P,ttP = n, PJP = 0} 



(3.3) 



as the manifold of rank n symmetric projection operators of R satisfying the La- 
grangian subspace condition PJP = 0. Note, that LGn is a compact, connected sub- 
manifold of the Grafimannian Gr„^2n- In order to obtain a deeper understanding of 
the geometry of this set, we observe that LGn is a homogeneous space for the action 
of the orthogonal symplectic group. Let 

GL„ :={X eK"''"|detXy^O} (3.4) 

and let 

0Sp2„ - {T e GL2„ I JT = J, T G SOsn} (3.5) 
denote the Lie group of orthogonal symplectic transformations. Let 

osp2„ = {X e S02„ \ J + JX ^ 0} (3.6) 

denote the associated Lie algebra of skew-symmetric Hamiltonian 2n x 2n-matrices. 
Thus the elements of osp2„ are exactly the real 2n x 2n-matrices T of the form 



A -B 
B A 



(3.7) 



defined by the condition, that A + tB £ Un, i.e. A + iB is skew-Hermitian, i.e., A E sOn 
and B € Sym„, where 

Un := {X e C"''"|X* = -X} (3.8) 

and the asterisk symbol denotes complex conjugate transpose and i := \/— T. Similarly, 
the elements of OSp2„ are the real 2n x 2n-matrices ^ of the form 



rx -Y] 

£^:=elY X I satisfying X G so„, Y £ Sym„ . 
In particular, 0Sp2„ is isomorphic to the unitary group 

u„ - {X e c"^"|x*x = /„}. 

The orthogonal symplectic group 0Sp2„ acts transitively on LGn via 

a : OSp2n X LGn LGn, 
(T, P) ^ T^PT, 

with the stabilizer subgroup of 



P ■= 



I 0' 




e LGn 



given as the set of all block-diagonal matrices 



A 0' 
A 



with As On- 



(3.9) 



(3.10) 



(3.11) 



(3.12) 



(3.13) 



Therefore LGn is a homogeneous space that can be identified with Un/On. 
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Theorem 3.1 (a) The Lagrange GrafimannianLGn is a smooth, compact submanifold 

of Sym2„ of dimension "^"2"^"*"^ • 
(b) The tangent space o/LGn at an element P G LGn is given as 

Tp LGn = {[P, n] \ osp2„}. (3.i4) 

Since the tangent space Tp LGn C Sym2„ is a subset of Sym2„, we can define the 
normal space at P to be the vector space 

Np LGn = {Tp LGn)-^ := {X £ Sym2„ j tr(XF) = for aU Y e Tp LGn}. (3.15) 



Proposition 3.1 Let P G LGn be arbitrary, 

1. The normal subspace in Sym2n at P is given as 

Np LGn = [x-^ &dl{ JXJ + X) \ X £ Sym2„| . (3.16) 

2. The linear map 



7!" : Sym2„ -> Symjn, 

X^ ^[P,[P,JXJ + X]] 

IS a self-adjoint projection operator onto Tp LGn with kernel NpLGn- 



(3.17) 



Proof To prove the first statement let O G osp2n' X G Sym2„ and P G LGn be 

2 



arbitrary. Then for [P, n] G Tp LGn and X - i a.dj,{JXJ + X) e Np LGn we have 



tr(^[P,n] (^X - ^adpiJXJ + X)^^ =tr(^r2 (^^ a.d%{JXJ + X) -adpX^^ 

= tr{n{^adp{JX.J + X) -adpX))) 
= ^tr(J7 adp(JXJ-X)) (SIS) 
= tr{nP{JXJ - X)) 
= tr {Xn{JPJ ~ P)) 
= 0, 



where we have used Lemma 12.11 O being skew-symmetric and Hamihonian, and the 
easily verified identity 

JPJ -P = -/2n, for aU P G LGn . (3.19) 

By (I3.18|l , Tp LGn and Np LGn are orthogonal subspaces of Sym2n with respect to 
the Frobenius inner product. Analogously to Proposition 12. II we now see that Sym2n = 
Tp LGn @Np LGn holds true as well: 
Every P G LGn can be written as 



In 




(3.20) 
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for some Q £ 0Sp2„. Note that for all il £ osp2„, and for all X,S £ Sym2„ 

tr {s[p, n])=tT (qsq^ [i'So]' Q^Q^] ) > 

tr(s(x-^iidj,{JXJ+X))) = tT{QSQ\QXQ^~ ^a&l ^.{.JQXQ^J+QXQ^))) 



OJ 



(3.21) 



By (|3.2ip and Sym2„ ^ (5(Sym2„)(5^ being an isomorphism, without loss of generality 
we might assume that 

\ln Ol 



P = 







(3.22) 



Assume there exists an 5 G Sym2„ being orthogonal to both subspaces. We will show 
the implication 



tr 



tr (S'adp n) — Q for all O G osp2„ 
{S{X ~ i a.A^p{JXJ + X)) = for aU X £ Sym2„ 



S = Q. 



Partition S £ Sym2„ as 



then 



'S'li 512 

•S'l2 'S'22. 



(3.23) 



(3.24) 



tr (S'adp J?) = for all Q £ osp2„ 512 £ so„, (3.25) 

where we have used the symmetry of the (12)— block of O, see (13. 6p and (|3.7p . Moreover, 



tr {S{X - i ad|,( JX J + X)^ = tr {sX - i(ad|, S){JXJ + X) 

= tr(\ 

\l 2 

Therefore, 



oil 2 

'S'22 



X 



(3.26) 



tr ( S{X - i a.dj,{JXJ + X) ) = for all X £ Sym^ 



S12 £ Sym„, 
5ii = S22 = 0. 



(3.27) 



Together with (|3.25p we conclude S — 0, i.e., Sym2„ — Tp LGn (BNp LGn as required. 

Now we prove the second claim. By the same reasoning as above we again might 
assume that 



Let 



Since 



X 







Xii Xi2 
Xi2 X22_ 



■k{X) = \ a.d^p{JXJ + X) = 



2 



(3.28) 



(3.29) 



(3.30) 
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we see that ■k'^{X) = t^{X) and moreover irnvr = Tp LGn . For any X — ^ adp(JXJ - 
X) £ Np LGn we have 



■k[x -\ adj^iJXJ + X)^ = -!v{X) - 7r^(X) = 0, 



(3.31) 



and by counting dimensions ker n — Np LGn • Finally, for all X,Y £ Sym2n and by 
using (|3:30)) 



{tt{X), Y) = tr adp{JXJ + X)Y 



■tr 



■tr 





X12+X 

Xu X12 
X12 X22 



12 



X12 + ^12 




y22. 



^12 + ^12 

yi2 + 



(3.32) 



= tr [X \&A],{JYJ + Y) 
= {X,niY)}. 
Thus n is self-adjoint and the result follows. 



Fortunately, the discussion of Riemannian metrics carries directly over from the 
Grafimannian case to the case of the Lagrange Grafimannian. We therefore omit the 
proof. 

Consider the surjective linear map 



with kernel 



adp : osp2„ ^ Tp LG„, 

n ^ [p, n] 

ker adp = {i? G osp2„ j Pf^ = HP}. 



(3.33) 



(3.34) 



We regard osp2n as an inner product space, endowed with the Frobenius inner product 
(i7i, 02} ~ tr{il^ 02)- Then adp induces an isomorphism of vector spaces 



adp : (keradp) ^ Tp LGn 



(3.35) 



and therefore induces an isometry of inner product spaces, by defining an inner product 
on Tp LGn via 

{{X,Y))p ~ -tr(Sdp\x)i^dpV)) (3.36) 



called the normal Riemannian metric. 



Proposition 3.2 The Euclidean and normal Riemannian metrics on the Lagrange 
Grafimannian LGn coincide, i. e. for all P £ LGn and for all X,Y £ Tp LGn we have 

tr(X^y) = - tr (aAp^iX) adp\y)^ . (3.37) 

Since a solution to (|2.38p with an initial value Pq £ LGn and Pq G Tp^ LGn is 
fully contained in LGn , and since geodesies are unique, the geodesies of LGn are also 
described by that equation. 
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Theorem 3.2 The geodesies of LGn are exaetly the solutions of the seeond order 
differential equation 

P+[P,[P,P]]^0. (3.38) 

The unique geodesic P{t) with initial conditions P(0) — Pq £ LGn, ^'(0) = Pq £ 
LGn is given by 

P{t) = e*!^"'^"! Po e-'I^O'-f""! . (3.39) 
We now consider local parametrizations for the Lagrange Grafimannian as well 

MP : Tp LGn LGn (3.40) 

satisfying 



/ip(0) = P and D^ip(O) = id. 



(3.4f) 



3.f Parametrizations and coordinates for the Lagrange Grafimannian 
3.1.1 Riemannian normal coordinates 

As before Riemannian normal coordinates are defined through 
expp : Tp LGn ^ LGn, 



expp(e) = e[«'^lpe-[«'^l 



Given any £ OSp2n with 



p = e 



In 








and 



We obtain 



expp(0 = 



Z' 
-Z 

cos Z 
— sin Z 



0, Z G Sym„ 



cos Z ~wiZ\0. 



(3.42) 

(3.43) 
(3.44) 
(3.45) 



3.1.2 QR- coordinates 

Let P and P] as in (|3.43|l and (|3.44|l . We define smooth QP-coordinates by the map 



^QR . j,^ LGn ^ LGn, 
Analogous to Section [27L2] we define 



(3.46) 



(3.47) 



where R denotes the unique Cholesky factor that solves R R = I + Z . Note that the 
Q-factor in p.47|) is orthogonal and symplectic. The above map (|3.46p is therefore well 
defined. 



'In Z' 




R-^ ZR-^' 


_~Z In_ 


O 


~ZR^^ R-^ 
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3.1.3 Cay ley coordinates 



Let P and P] be as in (I3.43P and (|3.44p . For any skew-symmetric Hamiltonian matrix 
S7 the Cayley transform 



Cay : osp2„ ^ 0Sp2„, 

{21 + n){2i- J7)"^ 

is smooth and satisfies D Cay(O) = id. The Cayley coordinates are defined as 
/^p ^ : Tp LGn — >■ LGn, 



CH^Cay([C,P])PCay(-[C,P]). 



(3.48) 



(3.49) 



Therefore, 



r 1 ^21 



/„ + ] [In - -Z] 0. (3.50) 



3.2 Gradients and Hessians 

Let F ■ Sym„ ^ R be a smooth function and let / := -F|lg„ denote its restriction to the 
Lagrange Gral3mannian. Let V p{P) G Sym„ be the gradient of F in Sym„ evaluated 
at P. Let }ip{P) : Sym„ x Sym„ — ^ R denote the Hessian form of F evaluated at P. 
We also consider }lessp{P) : Sym„ Sym„ as the corresponding linear map. Gradient 
and Hessian are formed using the Euclidean (Frobenius) inner product on Sym„. 

The next result computes the Riemannian gradient and Riemannian Hessian of the 
restriction / with respect to the induced Euclidean Riemannian metric on the Lagrange 
Grafimannian (and thus also for the normal Riemannian metric). 

Theorem 3.3 Let f : LGn R and consider the orthogonal projection operator n 
as defined by (|3.17[) . The Riemannian gradient, graOj, and the Riemannian Hessian 
operator S)es5f{P) : Tp LGn — > Tp LGn are given as 

0xadf{P)=n{Vp(P)) = iad|, p(P)).J + V p{P)) , (3.51) 

and 

iDess^(P)(0 = ladf, (j(HessF(P)(C))j + HessF(P)(C)) 

- iad|, (^j(^adpadvj^(p) ^^J + adpadv^(p) 



(3.52) 



for all ^ €Tp Gvm.n. 

Proof The first part follows again from the fact, that the Riemannian gradient of / 
coincides with the orthogonal projection of Vp onto the tangent space Tp LGn- 
Analogously to the proof of Theorem 12.41 

%(J'(0))(C,C)=Hp(P(0))(C,O-tr(Vp(P(0)) [e,K,P(0)]]). (3.53) 
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Thus by polarization 

^ / (P) (C, »?) = tr ( (Hess;. (P) (?) - ^ [P, [Vf (P) , C]] - ^ [Vf (P) , [P, ?]) 77) 

= tr (^(Hessj.(P)(0 - l[P, [V^ (P),?]] - ^[Vf (P), [P,C])^). 
This implies that the Riemannian Hessian operator is given as 
i5ess/(P)(0 = 7r(^HesSi^(P)(C) - i adp adv^(P) ? - ^advp(p) adp 
= 'r(Hessp(P)(C) - adp advj,(P) ? - 5 ad^y^ (p),p] c) 
= 'r(Hessp(P)(^)^ - 7r(^adpadv^(P) ^ ^^(^1 ad[v^(p) pj 



(3.54) 



(3.55) 



Together with Lemma 12.21 the Lemma 13. II below implies 

^(^ad[Vi=.(P),P]C) = 0. 

The result follows. 

Lemma 3.1 For any tangent vector ^ G Tp LGn and any A G Sym„ one has 

adp(J(ad[^ P] ^)J) = 0. 
Proof Without loss of generality we can assume that 

'Ai A2 



7m 0' 




'0 


Z 





Z 






Then 



-AJ Z - 



from which adp(J(ad[^ pj ^) J) = follows by a straightforward computation. 



(3.56) 

n 

(3.57) 
(3.58) 

(3.59) 

□ 



As a consequence we obtain the following formulas for the Riemannian gradient 
and Riemannian Hessian operator of the Rayleigh quotient function used to com- 
pute the n— dimensional dominant eigenspace of a real symmetric Hamiltonian (2n x 
2n)— matrix. 

Consider the set of real symmetric Hamiltonian (2n x 2n)— matrices p2,i 







'S T ' 




P2n := \h G Sym2„ 


H = 


T ~S 


, T,S e Sym„ 



Note that 



JKJ = K for all K € p2n 



(3.60) 
(3.61) 



Moreover, from the theory of Cartan decompositions, see e.g. [9], the following com- 
mutator relations are well known 

[p2n,P2n] C 0Sp2„, [p2n , 0Sp2„l C p2n (3.62) 

together with the isomorphisms 

{0Sp2,i)Q = 0SP2n: Q^(p2n)Q = p2n for all Q G OSp2„ . (3.63) 
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Corollary 3.1 Given H £ p2n Let 

F ■■ Sym2„ 

with restriction 



P !-» tr{HP), 



The Riemannian gradient and Riemannian Hessian operator are 

0rac)y(P) = [P,[P,^]], 
f)essy(P) = — adp o ad^, 

respectively. 

Proof Because the function F is linear the Euclidean gradient is simply 

and the Euclidean Hessian operator vanishes 

Hessf^(P) = 0. 

We therefore get for the Riemannian gradient using (|3.6ip and Theorem 13.3 
flrac)y(P) = ^ad|. (j(Vf(P)) J + (P) 
= i ad|, (^JHJ + H 



(3.64) 
(3.65) 

(3.66) 



(3.67) 
(3.68) 

(3.69) 



For the Riemannian Hessian operator we need some preparation. Let ^ = [P, n] G 
Tp LGn be arbitrary, i.e., O G osp2„ is arbitrary. Then there exists a Q G OSp2„ such 
that 





(3.70) 



But the commutator in p.70p is an element of P2n and therefore by (|3.63p the same 
holds true for ^ independent of P. Consequently, ad/f ^ G ''Sp2n ^-^id adp ad/f ^ G p2n, 
and finally J{aAp 'aAh = adp ad// ^. The formula for the Riemannian Hessian 
operator is now easily verified 



(3.71) 



5iess/(P)(0 = -^adp (^J(adpadv^(p) ^)J + adpadv^(p)^ 
= — ^ adp ^J(adp ad/f ^) J + adp ad/f 
= — adp ad/f ^. 



n 



4 Newton's method 

In the following we propose a class of Newton-like algorithms to compute a nonde- 
generate critical point of a smooth cost function / : Grm.n Local quadratic 

convergence of the proposed algorithm will be established. Parts of this section are 
based on the conference paper [?]• 
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4.1 The Euclidean case 

Let / : R" — > R be a smooth function and let x* £ R" be a nondegenerate critical 
point of /, i.e. the Hessian operator Hessy(2;*) is invertible. Newton's method for / is 
the iteration 

xo e R", Xk+i = Nf{xi,) ■- Xk - (Hess/(xfe))~^ ^ f{^k)- (4-1) 

Note that the iteration (|4.1[1 is only defined if Hess^(a;fe) is invertible for all fc G No = 
Nu{0}. However, since / is smooth, there exists an open neighborhood of x* in which 
the Hessian operator is invertible. 

It is well known that the point sequence {xi^^i^^f^^ generated by (|4.ip is defined 
and converges locally quadratically to a;* provided that xq is sufficiently close to a;*. 
For more information see e.g. [11] . 



4.2 The Grafimannian case 

Let {/UpjpgGrm „ be a family of local parametrizations of Grm.n. Let P* £ Grm,n be 
a nondegenerate critical point of the smooth function / : Grm.n R. If there exists 
an open neighborhood U C Grm,n of P* and a smooth map 

fi:Ux R'"("-™) Gr™,„ 

such that x) — ^p{x) for all P G [/ and X G R'"("-™) we will call {^J^p}peG■c,n.n * 
locally smooth family of parametrizations around P* . Let {MpjpgGrm „ s-nd {^^pjpgGrm 
be two locally smooth families of parametrizations around P* . Consider the following 
iteration on Grm,n 

Po G Gr„,,„, Pfe+i = up^ (^/OMP, (0)) (4.2) 

where N^^^p is defined in (|4.H) . The following theorem is an adaptation from [7], where 
it is stated and proved for arbitrary smooth manifolds. 

Theorem 4.1 Under the condition 

D MP* (0) = D //p- (0) (4.3) 

there exists an open neighborhood V C Gr^.n of P* such that the point sequence 
{Pk}keNo generated by (|4.2|) converges quadratically to P* provided Pq G V. 

Proof Let fi,!/ : U x R'"("~'") Grm.n be smooth and such that fi{P,x) = fip{x) 
and ^{P, x) = iyp{x) for all P eU and x G R™("-"), where U is a neighborhood of 

The derivative of the algorithm map 

S '. GTm.n ~^ GTrn.n^ 

P^y(^P^^ (Hess /o^ (P, 0)) V/o^ (P, 0)) ^^'^^ 

at P* is the linear map 

D s(P*) : Tp, Gr„i,„ ^ Tp. Gr™,„ (4.5) 
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defined by 

Ds{P*)h = Diu (P*, - (Hess/o^(P*,0))"' V/op(P*,0)) h 

+ D2 (P*, - (Hess/o;.(P*, 0))"' V/op(P*, 0)) (4.6) 

• (-Dp ((Hess/o^(P*,0))~' V/o^(P*,0)) . 

Here Dj(-)ft denotes the derivative of (•) witli respect to tiie i— tii argument in direction 
h, wfiereas, by abuse of notation, Dp denotes tlie differential operator to compute tlie 
derivative with respect to the argument P. 

The first summand on the right side in (|4.6|l is easily computed as 

Di u (P*, - (Hess/„^(P*,0))"V/o;.(J^*,0)) ft = Di i.(P*,0)/i = h, (4.7) 

which is true because P* is a critical point of / and the gradient therefore vanishes. 

The second summand in (|4.6p consists of two terms due to the chain rule. We first 
compute the left term giving 

D2i.(p*,-(Hessy„^(P*,0))-V;o^(P*,0)) =D2KJ'*,0) (4.8) 

because P* is critical. The evaluation of the right term is more involved. 

- Dp ((Hess^o^(P*,0))"Vyop(P*,0)) h = 

- (Dp ((Hess^o^(P*,0))"') h) Vjo;.(P*,0) 

=0 

- (Hess^o^(P*,0))"'Dp (Vyo^(P*,0))ft 

= - (Hessj„^(P*,0))"' Dp {Vjo^.{P*,0)) h. (4.9) 

By the definition of the gradient one has for any x G R™("-"') 

(V/o;.(P,0),x) =D/(P)-D2^.(P,0)-a; (4.10) 

and therefore using the critical point condition and the definition of the Hessian oper- 
ator in terms of second derivatives 

(Dp (V /o^(P*, 0)) h, x) =D\f{P*)- (D2 ^I{P*,0) ■x,h) 

+ D/(P*)Di (D2M(P*,0)-x)ft 

=0 

= D\f{P*)- {-D2^iiP*,0)■x,h) 

= D^f{P*)- (d2 /i(P* , 0) • a;, D2 KP* , 0) (D2 ^i{P*,0)y^h) 
= (Hessjo^(P*, 0)(i), (D2 KP*,0)y^ h) 
= (Hess /op(P*, 0) (D2 KP*,0)y' h, x) 

(4.11) 

We now can conclude 

Dp (V/o^(P*,0)) h = Hess/o^(P*,0) (D2 ^(f'*, 0))~' ft (4.12) 
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which in turn implies that 

(Hess/o^(P*, 0))"' Dp (V }of.{P\ 0)) h = (D2 ^i{P\Q))~^ h. (4.13) 
Summarizing our computations we have shown that 

D s{P*)h = /i - D2 iy{P*,0) (D2 tJ-{P*,0)y^ h 
= 0. 



(4.14) 



Consider now a local representation of s in coordinate charts around P* and s{P*) — 
P* . Let II • II denote any norm in the local coordinate space. By abuse of notation we 
will still speak of s, P* and so on in reference to their local coordinate representations. 
Using a Taylor expansion of s around P* , there exists a neighborhood Vp» of P* such 
that the estimate 

||s(P)-P*||< sup \\D''sm\-\\P~P*f (4.15) 
QeVp, 

holds for all P e Vp* . Therefore, the subset U C Vp* 

U~{P&Vp.\ sup ||d2s(Q)|| ■ ||P-P*|| < 1} 

is a neighborhood of P* that is invariant under s, and hence remains invariant under 
the iterations of s. This completes the proof of local quadratic convergence of the 
algorithm. □ 

A few remarks are in order. Geometrically, the iteration (|4.2[l does the follow- 
ing. The current iteration point Pj^ is pulled back to Euclidean space via the local 
parametrization np^, around P/^. Then one Euclidean Newton step is performed for 
the function expressed in local coordinates, followed by a projection back onto the 
Grafimannian using the local parametrization up^ around P^- 

For the special choice {fJ^p}p£M = {'^p}peM both Riemannian normal coordinates 
(cf. Section [2. 1.1 p . our iteration (|4.2p is precisely the so-called Newton method along 
geodesies of D. Gabay 5 , more recently also referred to as the intrinsic Newton method. 
This follows from the lemma below. 

Lemma 4.1 Let f : Grm,n — > K 6e a smooth function. For all P £ Grm.n, C ^ 
Tp GTm,n and any type € {exp, QR, Cay} we have 

V/o//p»(0) =0raO^(P) (4.16) 

and 

Hess^,^.^ype(0)(e) = S)t55fiP)iO- (4.17) 

Proof By Remark 12.11 the unique geodesic through a point P G Grm.n in direction 
£, G Tp Grm.ri is given by P{t) = l-i]^^{t^) and hence 

d 



((Sta()/(P),0)P= T^(,f°MpP)(e) 



= (V/o^oxp(o),e), (4.18) 



which by Proposition [23] implies (|4.16p for type = exp. The result for type — QR and 
type — Cay then follows from (|2.92p . By the same line of arguments (|4.17p follows 
from 

d2 



{{f,ZBBj{Pm),0)p = ^(/°MpP)(e) 



= {Hess/, ™(0)(C),0 (4.19) 
=0 



and Theorem [Ol □ 
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4.3 The Lagrange GraBmannian case 

All the above results carry over literally to Newton-like algorithms on the Lagrange 
Grai3mannian by substituting the respective formulas. 



4.4 Algorithms 

We conclude by presenting several specific instances of the resulting algorithms. We 
discuss the case of smooth functions F : Sym„ R with restriction / :— -FjGr,„ „ 
to the Grafimannian, as well as the special cases of the Rayleigh quotient function 
on the GraBmannian and the Lagrange GraBmannian. Furthermore, we consider the 
previously introduced nonlinear trace function for invariant subspace computations on 
the GraBmannian. In all cases we choose {pp} as the Riemannian normal coordinates 
and {jyp} as the QR-coordinates, see Sections 12. II and 13. II 

Recall that our convergence result requires the Hessian of the restricted function 
to be nondegenerate at the critical point. 

We first formulate a preliminary form of the algorithm we are interested in. 
Step 1. 

Pick a rank m symmetric projection operator of R", Pq £ Grm.n, and set j = 0. 
Step 2. 
Solve 

adp. Hessp(Pj)(adp^ f2j) — adp^. ady^j-p^. ) adp^. Qj — — adp^. \7p{Pj) 
for Hj G so{n). 
Step 3. 
Solve 

Pj = 

for 6>j G S0„. 
Step 4. 

Compute 

P,+i = ©7 - ad^p^. n,)0j)o,P,0j - ad|,^ n,)0j)'0,. 

Step 5. 

Set j ~ j + 1 and goto Step 2. 

Here the expressions in Step 2 result from applying Lemma 14.11 and Theorem 12.41 
and using the representation ^ = [P, G so{n) for an element ^ G Tp Grm,n. 

Inspecting the above algorithm, it is evident that it can be rewritten as an iteration 
in the 0j G SOn as follows. 



Im 




0, 
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Step 1. 

Pick an orthogonal matrix 0q G SOn corresponding to 



Im 6 




6*0 G Grm,n, 



and set j — 0. 
Step 2. 
Solve 



adp_. HesSi^^(Pj)(adp^ fij) — adp^. ady^j-p^. j adp^. fij — — adp^. Vp(Pj) 

for Qj G so(n). 
Step 3. 

Compute 

ej+i = ej (Oj {I - ad|,^. Oj)0j)^ and P,+i = 0j+, 

Step 4. 

Set j = j + 1 and goto Step 2. 



Irn 





Note that by equation (|2.80l) the term 



-7 ' 1 



(4.20) 



of which we have to compute a QR-factorization in Step 3 has a nice block structure 
that can be exploited to get an efficient implementation. Locally quadratic conver- 
gence is guaranteed as long as the specific QR-factorization used is differentiable, cf. 
Section [2T2] 

For specific functions the necessary computations might drastically simplify, as 
the following two examples show for the Rayleigh quotient function F : Sym„ 
E, F{P) = tr(AP), A G Sym„, cf. Corollary El 



Jf-.J)-.! Rayleigh quotient on the Graflmannian 

The equation we have to solve for Oj G so{n) in Step 2 becomes 



adp^ ad^ adp^. ilj = — adp. A 



which is equivalent to 



6>j (adp^. ad^ adp^ Qj)oJ = ©^(adp, A)oJ 



and, using 



P3 = e?. 



/m 




J' 



(4.21) 



(4.22) 



(4.23) 
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is equivalent to solving 



L oJ ' L oJ 



for € R™x(n-m) Denoting 





-Zj 



'All Ai2 

Ai2 ^22. 



L oJ 



we actually have to solve the Sylvester equation 

AuZj — ZjA22 = ^12- 

The resulting algorithm is exactly the algorithm presented in ffl. 



(4.24) 



(4.25) 



(4.26) 



Algorithm 1: Rayleigh quotient on the Grafimannian. 

Step 1. 

Pick an orthogonal matrix 0q £ SOn corresponding to 



Im 




6*0 G Gr,„^n, 



and set j = 0. 
Step 2. 

Compute 

Step 3. 

Solve the Sylvester equation 

for Zj G K™x(«-m)^ 
Step 4. 

Compute 



'All Ai2 
AJ2 A22. 



AiiZj ~ ZjA22 — Ai2. 



7^ T 

J-n — m 



and Pj+i = 0j+i 



Im 




^3+1 



Step 4. 

Set i = i + 1 and goto Step 2. 



Since the global maximum of the Rayleigh quotient function on the Grafimannian 
is a nondegenerate critical point, provided that there is a spectral gap after the mth 
largest eigenvalue of ^, we immediately get the following result. 

Theorem 4.2 For almost all matrices A G Sym„ Algorithm 1 converges locally quadrat- 
ically to the projector onto the m-dimensional dominant eigenspace. 
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4-4-^ Rayleigh quotient on the Lagrange Graflmannian 

The corresponding problem of optimizing the Rayleigh quotient function over the La- 
grange Grafimann manifold can be treated completely analogous to the approach above. 
Thus let A denote a real symmetric Hamiltonian matrix of size 2n x 2n. The Newton 
algorithm for optimizing the trace function tr(j4P) over LGn then is as follows. 



Algorithm 2: Rayleigh quotient on the Lagrange Grafimannian. 

Step 1. 

Pick an orthogonal matrix Oq € S02n corresponding to 



In 





©0 € LGn, 



and set j = 0. 
Step 2. 

Compute 



All Ai2 ' 
Ai2 -An 



0jAeJ. 



Step 3. 

Solve the Lyapunov equation 

AnZj+ZjAn = A^. 
for the symmetric matrix Zj € Sym„. 
Step 4. 

Compute 



= 



In Z j 
-Zj In 



and Pj+i = ^j+i 



In 





J+1 



Step 4. 

Set j = j + 1 and goto Step 2. 



Algorithm 2 is almost identical to Algorithm 1 on the Grafimannian, except for 
the simpler Sylvester equation that is indeed a Lyapunov equation here. Again, we 
immediately get the following result. 

Theorem 4.3 Algorithm 2 converges locally quadratically to any nondegenerate criti- 
cal point of the Rayleigh quotient function on the Lagrange Grafimannian. 



4-4-3 Invariant subspace computation 

We now turn to the more complicated task of solving the optimization problem of 
the nonlinear trace function tr((7 — P)APA'^) over the Grafimann manifold Grm^n- 

Herc A denotes an arbitrary real n x n matrix. This is interesting as it leads to a 
locally quadratically convergent algorithm by solving only linear matrix equations. 
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Our method requires only orthogonal matrix calculations and a linear matrix solver. 
We omit the straightforward calculations that allow one to compute the Newton step 
in terms of the linear matrix equation appearing in Step 3 of the algorithm. 



Algorithm 3: Invariant subspace function on the GraBmannian 


Step 1. 










Pick an orthogonal matrix Oq € SOn 


corresponding to 


Po = 


-el 





0' 




Oo € GTm,n, 


and set j = 0. 










Step 2. 










Compute 












'Axi A12 




-- OjAOj. 




A21 A22, 






Step 3. 










Solve the linear matrix equation 








An{AjiZj-ZjAj2)-{AjiZ^ 


— ZjA22)A22 


- aJi{zJ Ai2 + A2iZj) 


- {A12ZJ + ZjA2i)Aji = AJ1A22 - AuA^i. 


for Zj sRrnx(n-m)_ 










Step 4. 










Compute 










flT _ Im Zj 

L -fn-rnjQ 


and Pj+i = e]+i '^^ ° Oj+i 


Step 4. 










Set J = J ' + 1 and goto Step 2. 











We do not address the interesting but complicated issue how to solve the above 
linear matrix equation. Obviously one can always rewrite it as a linear equation on 
]^m(ra-m) ^^^^ then solve this, using matrix Kronecker products and vec-operations, 
by any linear equation solver. An alternative approach is to rewrite the equation in 
recursive form as 



AnXj - XjA22 = A2i{zJ_iAi2 + A2iZj_i) 

+ {Ai2Zj_i + Zj_iA2i)aJi - AJ1A22 + AuAJi (4.27) 
A\iZj — ZjA22 = Xj 

starting from e.g. Zq = 0. This system of linear equations is uniquely solvable if and 

only if the block matrices An and A22 have disjoint spectra. Once again, we immedi- 
ately get the following result. Recall, that an invariant subspace F of a linear operator 



36 



A : X ^ X is called stable, if the restriction A\y and corestriction operators A\x/V' 
respectively, have disjoint spectra. 

Theorem 4.4 Algorithm 3 converges locally quadratically to projectors onto stable 
invariant subspaces of A. 

5 Conclusions 

We presented a new differential geometric approach to Newton algorithms on a Grafi- 
mann manifold. Both the classical Grafimanniam as well as the Lagrange Grafimannian 
are considered. The proposed Newton algorithms depend on the choice of a pair of 
local coordinate systems having equal derivatives at the base points. Using coordinate 
charts defined by the Riemannian normal coordinates and Qfl-factorizations, respec- 
tively, leads to an efficiently implementable algorithm. Using the proposed method, new 
algorithms for symmetric cigcnspaco computations and non-symmetric invariant sub- 
space computations are presented that have potential for considerable computational 
advantages, compared with previously proposed methods. 
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